Fix: message-id in postprocessor/gelf-chunking #2662

BharatKJain · 2024-09-29T14:24:42Z

Pull request

Description

Changed message-id from auto-increment ID to randomized ID in postprocessor/gelf_chunking.rs

HELP NEEDED: I have left the hostname out when generating the message-id because I am not sure how we should handle it. Suggestions welcome.

Checklist

The RFC, if required, has been submitted and approved
Any user-facing impact of the changes is reflected in docs.tremor.rs
The code is tested
Use of unsafe code is reasoned about in a comment
Update CHANGELOG.md appropriately, recording any changes, bug fixes, or other observable changes in behavior
The performance impact of the change is measured (see below)

Performance

codecov · 2024-09-29T14:29:56Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.22%. Comparing base (bff8093) to head (3e4f5a1).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2662   +/-   ##
=======================================
  Coverage   91.22%   91.22%           
=======================================
  Files         309      309           
  Lines       60078    60083    +5     
=======================================
+ Hits        54805    54812    +7     
+ Misses       5273     5271    -2

Flag	Coverage Δ
e2e-command	`11.29% <0.00%> (-0.01%)`	⬇️
e2e-integration	`50.40% <0.00%> (+<0.01%)`	⬆️
e2e-unit	`12.57% <0.00%> (-0.01%)`	⬇️
e2etests	`52.73% <0.00%> (+<0.01%)`	⬆️
tremorapi	`14.51% <0.00%> (-0.01%)`	⬇️
tremorcodec	`62.66% <ø> (ø)`
tremorcommon	`63.04% <ø> (ø)`
tremorconnectors	`28.86% <0.00%> (-0.01%)`	⬇️
tremorconnectorsaws	`11.27% <0.00%> (-0.01%)`	⬇️
tremorconnectorsazure	`4.69% <0.00%> (-0.01%)`	⬇️
tremorconnectorsgcp	`25.31% <0.00%> (-0.01%)`	⬇️
tremorconnectorsobjectstorage	`0.06% <ø> (ø)`
tremorconnectorsotel	`12.58% <0.00%> (-0.01%)`	⬇️
tremorconnectorstesthelpers	`68.25% <ø> (ø)`
tremorinflux	`87.71% <ø> (ø)`
tremorinterceptor	`54.33% <100.00%> (+0.04%)`	⬆️
tremorpipeline	`31.17% <ø> (ø)`
tremorruntime	`47.25% <0.00%> (+<0.01%)`	⬆️
tremorscript	`55.12% <ø> (ø)`
tremorsystem	`5.78% <ø> (ø)`
tremorvalue	`69.55% <ø> (ø)`
unittests	`89.08% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
...mor-interceptor/src/postprocessor/gelf_chunking.rs	`92.47% <100.00%> (+0.42%)`	⬆️

... and 3 files with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bff8093...3e4f5a1.

Licenser

👍 Looks reasonable, I see nothing to prevent using this. Great documentation too :)

Two things I noticed:

It would be nice to mention how message IDs are generated in the docs (the //! section).
If you are up for a challenge: we try to keep all time and randomness out of tremor to allow for deterministic replays. We do this by using ingest_ns as the random seed and as the time source, so that an event logged with its ingest_ns can be replayed and generate exactly the same output. The random function is a good example. It'd be interesting to see the same concept reused here to allow repeatable yet still random message IDs. One way would be to use ingest_ns instead of the current epoch (which would be nice anyway, as looking up the time isn't fast) and then seed the RNG somehow (probably not with the ingest_ns, as that would make it useless), perhaps with the first n bytes of the message, or with some bytes of a hash of the message?
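A minimal sketch of that second suggestion, not the code in this PR: the generator (SplitMix64), the 8-byte prefix, and the names `SplitMix64`, `seed_from_prefix`, and `replayable_message_id` are all assumptions chosen for illustration, and tremor's own random function may work differently. It seeds a small deterministic generator from the first bytes of the message and mixes the result with ingest_ns, so a replay of the same event yields the same ID.

```rust
/// SplitMix64: a tiny deterministic generator, used here as a stand-in
/// for whatever seeded RNG the real code would use (assumption).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Pack the first (up to) 8 bytes of the message into a seed.
/// Shorter messages are zero-padded.
fn seed_from_prefix(payload: &[u8]) -> u64 {
    let mut seed = [0u8; 8];
    let n = payload.len().min(8);
    seed[..n].copy_from_slice(&payload[..n]);
    u64::from_le_bytes(seed)
}

/// Hypothetical helper: the ID depends only on the event itself
/// (ingest_ns and content), never on wall-clock time or OS randomness.
fn replayable_message_id(ingest_ns: u64, payload: &[u8]) -> u64 {
    let mut rng = SplitMix64::new(seed_from_prefix(payload));
    ingest_ns ^ rng.next_u64()
}
```

Calling `replayable_message_id` twice with the same `ingest_ns` and payload returns the same value, which is the replay property described above. Messages that share both their first 8 bytes and their ingest_ns would collide in this sketch, so a hash of the whole message may be the safer seed.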

BharatKJain · 2024-09-30T04:44:37Z

Okay, will do.
Thinking out loud: if we derive the ID from a hash of the message, repeated logs will produce the same message-id. That is a problem because the message-id has to be unique; collisions can cause problems when the message is decoded on the server side. I am not completely sure, but ideally we want unique message-ids so that we don't break GELF decoding on the server.

(How would a collision break the server side? The message-id is how the server decides whether a UDP packet belongs to a log it is already assembling or starts a new one. If we send the same message-id for multiple logs, the server will merge their data together, which ends up breaking the log.)
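To make the collision concern concrete, here is a rough sketch using the GELF chunk layout (2 magic bytes 0x1e 0x0f, 8-byte message ID, 1-byte sequence number, 1-byte sequence count, then payload). The functions `chunk` and `reassemble` are hypothetical and are not tremor's or Graylog's code; the receiver is a toy that simply keeps the last chunk seen per (message ID, sequence number), and real servers may drop or time out such messages instead.

```rust
use std::collections::{BTreeMap, HashMap};

const GELF_MAGIC: [u8; 2] = [0x1e, 0x0f];
const HEADER_LEN: usize = 12;

/// Split `payload` into GELF chunk datagrams carrying at most `chunk_size`
/// payload bytes each. `chunk_size` must be > 0. The 128-chunk limit of
/// GELF is not enforced in this sketch.
fn chunk(message_id: u64, payload: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    let parts: Vec<&[u8]> = payload.chunks(chunk_size).collect();
    let count = parts.len() as u8;
    parts
        .into_iter()
        .enumerate()
        .map(|(seq, part)| {
            let mut datagram = Vec::with_capacity(HEADER_LEN + part.len());
            datagram.extend_from_slice(&GELF_MAGIC);
            datagram.extend_from_slice(&message_id.to_be_bytes());
            datagram.push(seq as u8);
            datagram.push(count);
            datagram.extend_from_slice(part);
            datagram
        })
        .collect()
}

/// Toy receiver: groups chunks by message ID and concatenates them in
/// sequence order. A later chunk with the same ID and sequence number
/// silently replaces the earlier one.
fn reassemble(datagrams: &[Vec<u8>]) -> HashMap<u64, Vec<u8>> {
    let mut slots: HashMap<u64, BTreeMap<u8, Vec<u8>>> = HashMap::new();
    for d in datagrams {
        if d.len() < HEADER_LEN || d[..2] != GELF_MAGIC[..] {
            continue; // not a chunked GELF datagram
        }
        let mut id_bytes = [0u8; 8];
        id_bytes.copy_from_slice(&d[2..10]);
        let id = u64::from_be_bytes(id_bytes);
        slots
            .entry(id)
            .or_default()
            .insert(d[10], d[HEADER_LEN..].to_vec());
    }
    slots
        .into_iter()
        .map(|(id, parts)| {
            let bytes: Vec<u8> = parts.into_values().flatten().collect();
            (id, bytes)
        })
        .collect()
}
```

With chunk size 4, sending `aaaabbbbee` under ID 1 and `ccccdd` under ID 2 reassembles into two intact messages. Sending both under ID 7 leaves this toy receiver with a single message, `ccccddee`, which is neither of the originals.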

TBH I am also still trying to figure this out. Please share any suggestions. Am I thinking about this right? 😅

Licenser · 2024-10-03T13:11:34Z

Ja, just the message content would not work. I'm still considering whether message content + ingest_ns (the nanosecond at which the message was registered by tremor) would be enough. If a server produces the same log twice in the same nanosecond, that'd be very odd (but not impossible). OTOH, two randomly generated numbers being the same is also odd (but not impossible). It would also give us a more deterministic failure case: "When messages with the same content arrive at exactly the same time they will get duplicated message ids" instead of "if the RNG hates you, you'll get duplicated message ids".
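The content + ingest_ns idea could look roughly like the following. This is only a sketch under assumptions: the function name `message_id` is hypothetical, and FNV-1a is picked here because it needs no dependencies, not because the PR or tremor uses it.

```rust
/// Hypothetical helper: derive the 8-byte GELF message ID by hashing
/// ingest_ns followed by the message content with 64-bit FNV-1a.
fn message_id(ingest_ns: u64, payload: &[u8]) -> u64 {
    const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    let ts = ingest_ns.to_be_bytes();
    let mut hash = FNV_OFFSET;
    for byte in ts.iter().chain(payload.iter()) {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```

The failure case is then easy to state and to reproduce: two calls return the same ID when both the content and the ingest_ns match. Distinct inputs can still collide as with any 64-bit hash, but a given pair of inputs always behaves the same way on replay.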

BharatKJain requested review from darach, Licenser and mfelsche as code owners September 29, 2024 14:24

BharatKJain force-pushed the fix-gelf-post-processor branch from 7954549 to ba74fab Compare September 29, 2024 14:26

Licenser approved these changes Sep 29, 2024

Fix: message-id in postprocessor/gelf-chunking

3e4f5a1

BharatKJain force-pushed the fix-gelf-post-processor branch from ba74fab to 3e4f5a1 Compare October 1, 2024 08:36
